Report of the Progress on Grant DAMD17-96-1-6226


For  the  Period  October  1996  to  October  2001 


Introduction 

Mammography is the most sensitive procedure for detecting breast cancer. Unfortunately,
as currently practiced, the positive predictive value (PPV) is low. Between 0.5% and 2.0% of
all mammographic exams result in biopsy, yet between 70% and 90% of women who undergo
biopsy for mammographically suspicious non-palpable lesions have no malignancy [1]. Each year
this amounts to several hundred thousand biopsies performed on benign lesions. Women
who undergo biopsy for a benign finding are unnecessarily subjected to the discomfort,
expense, potential complications, change in cosmetic appearance, and anxiety that can
accompany breast biopsy [1-4]. The cost of these procedures is between $3000 and $5000 per
biopsy and is significant in the present political and economic effort to reduce expenditures. In
clinical practice, mammography reporting systems are typically implemented as a data entry
form into a relational database. The system that we describe in this report can be easily
integrated into the mammographers' work-flow since it is also based on a relational database
structure. The clinician interprets the mammogram, records the findings using a standard
reporting lexicon (BI-RADS™), and enters these findings into the database. All of this is
currently the standard procedure. The database is then searched for similar cases and the
fraction of those similar cases that were malignant is returned. In practice, a threshold is
applied to the fraction: if the fraction is above the threshold, the computer aid recommends
biopsy. The woman's health care team can then include this recommendation in the medical
decision for biopsy. The long term hope is that this computer aided approach may significantly
improve the delivery of health care to these women.

The focus of this project has been to gather data from multiple sites in order to verify
whether the artificial neural network computer aid to the diagnosis of breast cancer can be
translated between locations. While the system has proven to be robust and could in principle be
trained for every application location, deployment would be much easier if we could demonstrate
that a single system could be developed and deployed nationally. This deployment would facilitate
transferring the expertise currently present in only a few tertiary care centers to the public at
large and to smaller and more rural settings, and thus improve access for under-served
populations.


Progress 

Progress is demonstrated through the 35 publications supported in part by this grant.

The publications included 10 peer-reviewed journal articles, 15 manuscripts in conference
proceedings, and 10 conference abstracts.

1. Baker JA, Kornguth PJ, Lo JY, Floyd CE Jr. Artificial Neural Network: Improving the
Quality of Breast Biopsy Recommendations. Radiology, 198:131-135; 1996.

2. Baker JA, Kornguth PJ, Floyd CE Jr. Breast Imaging Reporting and Data System
Standardized Mammography Lexicon: Observer Variability in Lesion Description. Amer. J.
Roent., 166:773-778; 1996.

3. Lo JY, Baker JA, Kornguth PJ, Igelhart R, Floyd CE Jr. Predicting Breast Cancer Invasion
From BI-RADS Mammographic Features Using Artificial Neural Networks On The Basis
Of Mammographic Features. Radiology, 203:159-163; 1997.

4. Tourassi GD, Floyd CE Jr. The Effect of Data Sampling on the Performance Evaluation of
Artificial Neural Networks in Medical Diagnosis. Medical Decision Making, 17:186-192;
1997.

5. Lo JY, Baker JA, Kornguth PJ, Floyd CE Jr. Effect of Patient History Data on the
Prediction of Breast Cancer from Mammographic Findings with Artificial Neural
Networks. Acad Radiol, 6:10-15; 1999.

6. Gavrielides MA, Lo J, Vargas-Voracek R, Floyd CE Jr. Segmentation of suspicious
clustered microcalcifications in mammograms. Medical Physics, 27(1):13-22; 2000.

7. Floyd CE Jr, Lo JY, Tourassi GD. Breast Biopsy: Case-Based Reasoning Computer-Aid
Using Mammography Findings for the Decision to Biopsy. In press, American
Journal of Roentgenology (AJR) 2000.

8. Floyd CE Jr, Lo JY, Tourassi GD. Breast Biopsy: Case-Based Reasoning Computer-Aid
Using Mammography Findings for the Decision to Biopsy. American Journal of
Roentgenology (AJR), 175:1-6; 2000.

9. Markey MK, Lo JY, Vargas-Voracek R, Tourassi GD, Floyd CE Jr. “Perceptron Error
Surface Analysis: A Case Study in Breast Cancer Diagnosis”, submitted to IEEE
Transactions on Medical Imaging.

10. Lo JY, Markey MK, Baker JA, and Floyd CE, Jr, "Cross-institution evaluation of
BI-RADS model for mammographic diagnosis of breast cancer," submitted (2001).

Conference  Proceedings: 

1. Floyd CE Jr, Yun A, Lo JY, Tourassi GD, Sullivan D, Kornguth P. Prediction of Breast
Cancer Malignancy for Difficult Cases using an Artificial Neural Network. In World
Congress on Neural Networks, International Neural Network Society Annual Meeting
(INNS), 1:1127-1132; 1994.

2. Floyd CE Jr, Grissom A, Yun J, Lo JY, Dovan M, Humphrey L, Sullivan DC, Kornguth PJ.
Computer-Aided Breast Cancer Prediction: Integration of a Mammography Findings
Database with an Artificial Neural Network. In Computer Applications to Assist
Radiology, Symposium for Computer Assisted Radiology (SCAR), 255-260; 1994.

3. Floyd CE Jr, Soo MS, Tourassi GD, Kornguth PJ. Computer aided prediction of breast
implant rupture based on mammographic findings. In Proceedings of the International
Society for Optical Engineering (SPIE), 2434:471-477; 1995.

4. Lo JY, Grissom AT, Floyd CE Jr, Kornguth PJ. Computer-aided diagnosis of
mammograms using an artificial neural network: Merging of standardized input features
from the ACR lexicon. In Proceedings of the International Society for Optical
Engineering (SPIE), 2434:571-578; 1995.

5. Floyd CE Jr, Lo JY, Tourassi GD, Kornguth P. Computer aided diagnosis using genetic
algorithms and neural networks. In World Congress on Neural Networks, International
Neural Network Society Annual Meeting (INNS), II:863-866; 1995.

6. Tourassi GD, Floyd CE Jr, Sostman HD, Coleman RE. Performance evaluation of an
artificial neural network for the diagnosis of pulmonary embolism using the
cross-validation, jackknife, and bootstrap methods: a comparison study. In World Congress on
Neural Networks, International Neural Network Society Annual Meeting (INNS),
II:897-900; 1995.

7. Lo JY, Baker JA, Kornguth PJ, Floyd CE Jr. Computer-aided diagnosis of mammography:
Artificial neural networks for optimized merging of standardized BI-RADS features. In
World Congress on Neural Networks, International Neural Network Society Annual
Meeting (INNS), II:885-888; 1995.

8. Floyd CE Jr. Use of genetic algorithms for computer-aided diagnosis of breast cancer from
image features. In Proceedings of the International Society for Optical Engineering (SPIE),
2710:51-58; 1996.

9. Lo JY, Floyd CE Jr, Kornguth PJ. Computer-aided diagnosis of mammography using an
artificial neural network: predicting the invasiveness of breast cancers from image features.
In Proceedings of the International Society for Optical Engineering (SPIE), 2710:725-732;
1996.

10.  Lo  JY,  and  Floyd  CE,  Jr,  "Analysis  of  error  surfaces  of  neural  network  applied  to 
computer-aided  diagnosis  in  mammography,"  World  Congress  on  Neural  Networks  '96 
(International  Neural  Network  Society  1996  Annual  Meeting),  Lawrence  Erlbaum 
Associates,  Inc.,  San  Diego,  CA,  1240  (1996). 

11. Lo JY, and Floyd CE, Jr, "Self-organizing maps for analyzing mammographic
findings," in Karayiannis MB, Ed., IEEE International Conference on Neural
Networks, IEEE, Houston, TX, 4:2472-4 (1997).

12.  Vargas-Voracek  R,  Floyd  CE  Jr.  Computer-Aided  Diagnosis  for  Early  Detection  of  Breast 
Cancer  from  Mammograms.  Susan  G.  Komen  Breast  Cancer  Foundation  “Reaching  for  the 
Cure”  National  Grant  Conference.  (1998). 

13. Vargas-Voracek R, Floyd CE Jr. Markov-Random Field Texture Model for Automatic
Breast Parenchyma Characterization. Accepted for presentation at the 84th Radiological
Society of North America (RSNA) Scientific Assembly and Annual Meeting. November
29-December 4, 1998.

14.  Lo  JY,  and  Floyd  CE,  Jr,  "Computer-aided  diagnosis  of  breast  cancer,"  Doi  K  et  al.,  Ed., 
First  International  Workshop  on  Computer-Aided  Diagnosis,  Elsevier  Science,  Univ.  of 
Chicago,  Chicago,  IL,  1182  (ICS  1182):  221-5  (1998). 

15.  Floyd  CE,  Jr,  Lo  JY,  and  Baker  JA,  "Prediction  of  breast  biopsy  outcomes  from 
mammographic  findings,"  Doi  K  et  al.,  Ed.,  First  International  Workshop  on 
Computer-Aided  Diagnosis,  Elsevier  Science,  Univ.  of  Chicago,  Chicago,  IL,  1182 
(ICS  1182):  193-200  (1998). 

16.  Floyd  CE,  Jr,  Lo  JY,  Tourassi,  GD,  "Case-Based  Reasoning  as  a  Computer  Aid  to 
Diagnosis,"  Medical  Imaging  1999:  Image  Processing,  Hanson  KM,  Ed.,  Proc.  SPIE, 
3661:486-489, 1999. 

17.  Vargas-Voracek  R,  Floyd  CE  Jr.  Hierarchical  Markov-Random  Field  Texture  Modeling 
for  Mammographic  Structure  Segmentation  Using  Multiple  Spatial  and  Intensity  Image 
Resolutions.  1999  Medical  Imaging  Symposium.  International  Society  for  Optical 
Engineering  (SPIE).  February  20-26, 1999. 

18.  Tourassi  GD,  Floyd  CE  Jr,  Lo  JY.  A  Constraint  Satisfaction  Neural  Network  for  Medical 
Diagnosis.  1999  International  Conference  on  Neural  Networks  (ICNN),  Washington,  DC. 

19.  Tourassi  GD,  Floyd  CE,  Jr,  and  Lo  JY,  "Use  of  constraint  satisfaction  neural  network  for 
breast  cancer  diagnosis  and  dynamic  scenarios  simulation,"  Medical  Imaging  2000:  Image 
Processing,  Hanson  KM,  Ed.,  SPIE  Medical  Imaging  2000:  Image  Processing,  Proc.  SPIE 
3979: 46-54  (2000). 
Lo SCB, Ed., Computer-Aided Diagnosis Workshop, Georgetown University Medical
Center, Washington, DC. 22, 1994

2. Floyd CE Jr, Lo JY, Baker JA, Kornguth PJ: Interactive Computer-Aided Diagnosis of
Breast Cancer. Radiology 197P:533, 1995.

3. Baker JA, Kornguth PJ, Floyd CE Jr: Interobserver variability in Radiologist's Use of the
BI-RADS Mammography Lexicon. Radiology 197P:242, 1995.

4. Baker JA, Kornguth PJ, Lo JY, Floyd CE Jr: Artificial Neural Network for the Prediction
of Breast Cancer with the BI-RADS Standardized Lexicon. Radiology 197P:242, 1995.

5. Lo JY, Baker JA, Kornguth PJ, Floyd CE Jr: Application of Artificial Neural Networks to
the interpretation of Mammograms on the Basis of the Radiologist's Impression and
Optimized Image Features. Radiology 197P:242, 1995.

6. Lo JY, Baydush AH, Baker JA, Kornguth PJ, Floyd CE Jr: Computer-Aided Diagnosis of

Methods

To assess how the proposed systems might perform in different health care delivery settings, we

foremost, quality data for these cases are difficult to obtain. While a number of
investigators would be able to provide the mammographic data alone, the need for patient
demographic data dramatically increases the research effort required to obtain the data
that we need. Several of our original collaborators found that they were unable to support the
research effort required with the funds that were provided to us for this task. Our initial
estimates of the financial cost of providing and acquiring these cases were too low. This
is due in part to the rapid economic restructuring of major research medical centers
over the last five years. While the overall result of this restructuring on the economics of
health care has been positive, the impact on research has been very negative. The
simple explanation is that hospitals are no longer able to provide the level of infrastructure
support previously afforded to research activities. The impact on this research project
is that the acquisition of cases, so critical to this project, has been more expensive than
anticipated.

At our own institution, Duke University, we have established an accurate and efficient procedure
for obtaining the mammographic data, the pathological data, and the demographic data. It is
unfortunate that the integrated medical radiological information system that was scheduled to go
on line within the first year has yet to be realized. Nonetheless, through diligent application
of old-fashioned data acquisition using paper forms and hand verification, we have acquired over
1500 cases that have been verified extensively. A preliminary evaluation of the similarities and
differences between the data sets acquired at the three medical institutions is presented here.

Over the last year we have performed several comparisons of a neural network and other
classification systems on these data sets. Software has been developed to facilitate the rapid
organization and comparison of multiple data sets and to facilitate the arrangement of these data
sets into training, testing, and evaluation sets. In an earlier progress report, we demonstrated that
the distributions of mammographic findings do not adhere to a normal distribution.
This is particularly true given the relatively small number of cases with any one finding, such as
masses with a micro-lobulated margin. Accepting this reality, there are few statistical tests that
are appropriate to apply when trying to describe the similarities and differences between the
distributions of findings. One technique that is rigorous and at the same time intuitively
appealing is case matching. With this technique we set definitions of similarity and then
search for cases that are similar between the two data sets given these definitions. The
definitions may be strict or lax, and the failure or success of the similarity matching under
these different criteria can form the basis for describing the similarity of the two data sets. This
is in fact an implementation of the artificial intelligence classification technique known as
case-based reasoning and serves as the backbone of one of our most successful CAD systems.

We implemented a case-based reasoning formalism using the Microsoft Access database
language. In fact, after implementing the system as a technique for comparing the databases, we
found that it made a very good classifier in itself. It is in this form that we have
implemented the case-based reasoning and applied it to the task of determining similarity or
difference between the study databases. Below we present results of this evaluation of these
data sets using the case-based reasoning system under reasonably lax matching criteria. The
overall strategy was to consider cases from Duke University as one set and cases from the
University of Pennsylvania as a distinct set.

The case-based reasoning algorithm is very simple and intuitive. Case-based reasoning is a
computer implementation of the question "of all the cases in one data set, how many match a
particular selected case from another data set?" To investigate this question, the two data sets
are structured as tables in a database and Structured Query Language (SQL) is employed to
perform the matching and scoring. Matching rules are implemented as numerical and logical
conditions in the queries. The result set from a query is a list of all cases in the reference
database that matched the single case selected from the test data set. A malignancy ratio is
formed as the number of cases in the match list that were malignant at biopsy divided by the
total number of cases that matched. This process is repeated for each case in the test data set.
The malignancy ratio is taken as a decision variable and the ROC performance is evaluated. An
evaluation of the similarity of the two data sets may be obtained by switching the roles of the
data sets in this process: the data set that was initially used as the reference data set is now
used as the test data set, while the data set that was originally used as the test set is now used
as the reference. Comparison of the two ROC results forms a functionally useful test for
similarity. The goal of this evaluation was to determine whether, as used in the computer aided
prediction models, the two data sets were equivalent.
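
The matching and scoring procedure described above can be sketched in a few lines. The sketch below is illustrative only: it uses Python's built-in sqlite3 module rather than Microsoft Access, and the table names (site_a, site_b), the column names, the age tolerance, and the rows themselves are all made-up assumptions, not the study's schema, matching rules, or data. It also assumes that test cases with no matches are left out of the ROC calculation, and that a case is excluded from its own match list when one data set serves as both reference and test; the report does not state how either situation was handled.

```python
import sqlite3

# Hypothetical schema (not the study's): two categorical findings, patient age,
# and biopsy outcome (1 = malignant, 0 = benign).
COLUMNS = "(id INTEGER, margin TEXT, shape TEXT, age INTEGER, malignant INTEGER)"

# Assumed lax matching rule: identical margin and shape, age within a tolerance.
# The "id <> ?" term lets a case be excluded from its own match list.
MATCH_SQL = """
    SELECT COUNT(*), COALESCE(SUM(malignant), 0)
    FROM {ref}
    WHERE margin = ? AND shape = ? AND ABS(age - ?) <= ? AND id <> ?
"""


def malignancy_ratios(conn, ref, test, age_tol=5):
    """Return (malignancy ratio, biopsy outcome) for every case in table `test`.

    The ratio is None when no case in table `ref` matches.
    """
    scored = []
    cases = conn.execute(
        f"SELECT id, margin, shape, age, malignant FROM {test} ORDER BY id"
    ).fetchall()
    for case_id, margin, shape, age, outcome in cases:
        skip_id = case_id if ref == test else -1  # leave-one-out within one set
        n_match, n_malignant = conn.execute(
            MATCH_SQL.format(ref=ref), (margin, shape, age, age_tol, skip_id)
        ).fetchone()
        scored.append((n_malignant / n_match if n_match else None, outcome))
    return scored


def roc_area(scored):
    """ROC area as the Mann-Whitney statistic; unmatched cases are skipped."""
    pos = [s for s, y in scored if s is not None and y == 1]
    neg = [s for s, y in scored if s is not None and y == 0]
    if not pos or not neg:
        raise ValueError("need at least one scored malignant and one benign case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def build_demo():
    """Create two small made-up tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    rows = {
        "site_a": [
            (1, "spiculated", "irregular", 60, 1),
            (2, "spiculated", "irregular", 62, 1),
            (3, "spiculated", "irregular", 58, 0),
            (4, "circumscribed", "oval", 45, 0),
            (5, "circumscribed", "oval", 47, 0),
            (6, "circumscribed", "oval", 44, 1),
        ],
        "site_b": [
            (1, "spiculated", "irregular", 61, 1),
            (2, "circumscribed", "oval", 46, 0),
            (3, "spiculated", "irregular", 59, 1),
            (4, "circumscribed", "oval", 43, 0),
        ],
    }
    for table, data in rows.items():
        conn.execute(f"CREATE TABLE {table} {COLUMNS}")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", data)
    return conn


conn = build_demo()
# Reference = site_a with test = site_b, then the roles are switched.
auc_a_ref = roc_area(malignancy_ratios(conn, "site_a", "site_b"))
auc_b_ref = roc_area(malignancy_ratios(conn, "site_b", "site_a"))
print(f"ROC area, site_a as reference: {auc_a_ref:.3f}")
print(f"ROC area, site_b as reference: {auc_b_ref:.3f}")
```

Tightening or loosening the WHERE clause (for example, changing age_tol or adding further findings) corresponds to the strict and lax similarity definitions discussed earlier; comparing the two printed areas mirrors the role-switching comparison.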

Results 

The ROC plot shown in Fig. 1 demonstrates the performance for predicting the outcome of
biopsy when the Duke data set is used as the testing set. Results for two different reference
databases are plotted. The solid line shows the results when the Duke data set is used as the
reference database, while the dashed line shows the performance when the Penn data set is used
as the reference database. While there is almost no difference in the area under the two curves,
the model that used the Duke data set as the reference database has higher performance in the
region of high sensitivity, which is where the system would operate in a clinical application.

The ROC plot shown in Fig. 2 demonstrates the performance for predicting the outcome of
biopsy when the Penn data set is used as the testing set. Results for the two different reference
databases are again plotted. The solid line shows the results when the Duke data set is used as
the reference database, while the dashed line shows the performance when the Penn data set is
used as the reference database. The model using the Duke reference database shows higher
performance for sensitivities between 80% and 90%, but each reference database provides equally
good performance for sensitivities from 90% to 100%.

Both the Duke and Penn data have been explored using an ANN model. For comparison, the
performances of the ANN and both CBR models are shown in Fig. 3 when predicting the
outcomes for the Duke data set. The solid line shows the results for CBR when the Duke data set
is used as the reference database, while the dashed line shows the performance when the Penn
data set is used as the reference database. The dotted line shows the performance of the ANN for
comparison. Of the three predictive models, CBR using the Duke reference database shows the
highest performance for sensitivities between 98% and 100%, although the difference is very
small and is not statistically significant. For sensitivities between 75% and 98%, the ANN
provides higher performance than either of the CBR models.

The  matrix  in  Table  1  shows  the  predictive  performance  as  an  ROC  area  for  all  combinations  of 
the  Duke,  Penn,  and  the  combined  (noted  as  “Both”)  data  sets.  Each  testing  data  set  is  specified 
by  a  column  with  the  name  listed  in  the  first  row.  Each  reference  database  is  specified  by  a  row 
with  the  name  listed  in  the  first  column.  The  corresponding  ROC  area  is  located  at  the 
intersection.  There  is  essentially  no  difference  in  the  performance  for  predicting  the  outcomes  in 
any  of  the  three  data  sets  regardless  of  which  is  used  as  the  reference. 

Table 2 presents a comparison of several measures of the predictive performance for both the
Duke and Penn testing data sets using both the CBR and the ANN models. Here, the ROC area is
compared with the specificity at 98% sensitivity. In addition, for the threshold setting that
produces 98% sensitivity, the performance is presented as the number of benign biopsies that
could have been spared along with the number of malignancies that would have been missed.
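
The fixed-sensitivity measures described above can be computed directly from a list of decision values and biopsy outcomes. The following is a minimal sketch under stated assumptions: the function name operating_point, the rule "recommend biopsy when the score is at or above the threshold", and the demonstration values are all illustrative and are not taken from the study.

```python
import math


def operating_point(scores, outcomes, target_sensitivity=0.98):
    """Summarize performance at the highest threshold that reaches the target.

    Biopsy is recommended when score >= threshold. `outcomes` holds 1 for
    malignant and 0 for benign.
    """
    malignant = sorted((s for s, y in zip(scores, outcomes) if y == 1), reverse=True)
    benign = [s for s, y in zip(scores, outcomes) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign case")
    # Number of malignancies that must fall at or above the threshold.
    # round() guards against floating-point noise before taking the ceiling.
    needed = max(1, math.ceil(round(target_sensitivity * len(malignant), 9)))
    threshold = malignant[needed - 1]
    caught = sum(s >= threshold for s in malignant)
    spared = sum(s < threshold for s in benign)
    return {
        "threshold": threshold,
        "sensitivity": caught / len(malignant),
        "specificity": spared / len(benign),
        "benign_spared": spared,
        "malignancies_missed": len(malignant) - caught,
    }


# Made-up decision values for illustration only (not study data).
demo_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.5, 0.2, 0.1, 0.35, 0.05]
demo_outcomes = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
point = operating_point(demo_scores, demo_outcomes)
print(point)
```

With only five malignant cases in the toy data, a 98% target forces every malignancy to be caught; on a larger data set the same target would allow a small number of missed malignancies in exchange for more benign biopsies spared.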

Conclusion 

From the study just described we can conclude that the data sets from the University of
Pennsylvania and from Duke University are equivalent in terms of the distributions
of BI-RADS findings and their relationship to the likelihood of malignancy. While not
quantitatively analyzed, it seems intuitively obvious that there could have been differences
between the patient populations from these data sets, with the set from U of Penn (Philadelphia)
representing an urban population and that from Duke (Durham) representing a more rural
population. That these differences were not seen in the experiments suggests that this aspect of
patient population may not be a factor. In particular, the similarity between U of Penn and Duke
suggests that the predictive model described here is relatively insensitive to the differences
in these patient populations. These results support the conclusion that a separate
predictive model for each intended locale may not be required.

In conclusion, 3407 cases were acquired from 5 institutions geographically distributed
along the east coast. A preliminary evaluation was performed to examine the similarity of two of
the data sets and the impact of any differences on the performance of a case-based reasoning
system for the prediction of biopsy outcomes from mammographic findings reported using the
BI-RADS™ lexicon. The results indicate that while there are differences in the distribution of
findings and their relationship to the likelihood of malignancy as predicted using the CBR
model, the CBR is robust enough that its predictive power is minimally affected by these
variations. Analysis of these data will continue and will be submitted for publication in 2001.

[Figure 1 plot: "Duke Test: Effect of Reference Data"; axis label: Sensitivity]

Figure 1. The ROC plot of the predictive model performance for predicting the outcome
of biopsy when the Duke data set is used as the testing set. The solid line shows the
results when the Duke data set is used as the reference database, while the dashed line
shows the performance when the Penn data set is used as the reference database. The
Duke reference database shows higher performance in the region of high sensitivity.

[Figure 2 plot: "Penn Test: Effect of Reference Data"; axis label: Sensitivity]

Figure 2. The ROC plot of the performance for predicting the outcome of biopsy when the Penn
data set is used as the testing set. The solid line shows the results when the Duke data set is used
as the reference database, while the dashed line shows the performance when the Penn data set is
used as the reference database. The Duke reference database shows higher performance for
sensitivities between 80% and 90%, but each reference database provides equally good
performance for sensitivities from 90% to 100%.

[Figure 3 plot: "Duke Test: Effect of Reference Data"]

Figure 3. The ROC plot of the performance for predicting the outcome of biopsy when the Duke
data set is used as the testing set. The solid line shows the results for case-based reasoning when
the Duke data set is used as the reference database, while the dashed line shows the performance
when the Penn data set is used as the reference database. The dotted line shows the performance
of an artificial neural network for comparison. Of the three predictive models, the case-based
reasoning model using the Duke reference database shows the highest performance for
sensitivities between 98% and 100%, although the difference is very small and is not statistically
significant. For sensitivities between 75% and 98%, the artificial neural network provides higher
performance than either of the case-based reasoning models.

[Table 1; column heading: Testing Data set]

Table 1. Predictive performance as an ROC area for all combinations of the Duke, Penn, and
the combined (noted as “Both”) data sets. Each testing data set is specified by a column with
the name listed in the first row. Each reference database is specified by a row with the name
listed in the first column. The corresponding ROC area is located at the intersection. There is
essentially no difference in the performance for predicting the outcomes in any of the three data
sets regardless of which is used as the reference.

[Table 2: "Comparison of performance of CBR with ANN"; column heading: Model]

Table 2. A comparison of several measures of the predictive performance for both the Duke and
Penn testing data sets using both the CBR and the ANN models. (CBR = case-based reasoning,
ANN = artificial neural network)